Wave function optimization in the variational Monte Carlo method 
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An appropriate iterative scheme for the minimization of the energy, based on the variational 
Monte Carlo (VMC) technique, is introduced and compared with existing stochastic schemes. We 
test the various methods for the ID Heisenberg ring and the 2D t-J model and show that, with 
the present scheme, very accurate and efficient calculations are possible, even for several variational 
parameters. Indeed, by using a very efficient statistical evaluation of the first and the second energy 
derivatives, it is possible to define a very rapidly converging iterative scheme that, within VMC, 
is much more convenient than the standard Newton method. It is also shown how to optimize 
simultaneously both the Jastrow and the determinantal part of the wave function. 



Since the seminal work of Gutzwiller, a large amount 
of progress has been achieved in the understanding of 
strongly correlated materials by means of correlated wave 
functions (WF). The fractional quantum Hall effecti 
and the theory-still controversial- of high temperature 
superconductors^ are two important among many other 
examples. 

Most correlated WF, here indicated by \ipa), can be 
obtainec&^£ by applying a two body Jastrow factor J to 
a mean- field WF \D) described by a single determinant 
or a superconducting BCS state (SDBCS), so that: 



Hg) = J\D) 



(1) 



where J contains two-body density-density and spin cor- 
relations. For accurate calculations, the inclusion of these 
two-body correlations in tpQ is possible by means of the 
VMC statistical approach. 

However the application of correlated WF to complex 
electronic system o 6 i 7 i 8 , or-e.g.-to describe correctly the 
metal-insulator transition? , requires the use of many vari- 
ational parameters. A drawback that has certainly lim- 
ited the application of VMC to important physical and 
chemical problems with a large number of electrons. 

In order to overcome the above limitations of VMC, in 
this paper we introduce a minimization scheme, that al- 
lows to optimize efficiently the energy expectation value: 



(lpa\tlJa} 



(2) 



where H is the model Hamiltonian considered, a = 
a\,ot2, ■ ■ ■ a,p is a set of p variational parameters appear- 
ing in the WF tpa( x ) = { x \^a)i where \x) denotes a 
configuration with defined electron spins and positions. 
Henceforth the symbol <> indicates the quantum ex- 
pectation value over the state ijj a , so that E a =< H >. 
Within the Newton method (NM), in order to improve 
the variational parameters, E a+J is expanded quadrat- 
ically in 7 and minimized, then the parameters are 
changed a' = a + 7, and the iteration repeated until 
7 is negligible. In the following we describe an iterative 
technique, that is based on a similar strategy, and that, 
within VMC, performs much better than standard NM. 



At each iteration and for each variational parameter ctk 
we defineiS an operator Ok , diagonal in the configuration 
basis, with diagonal elements: 



n . v d a „ip a (x) 



(3) 



In order to simplify the derivation we assume that these 
operators Ok appear in the WF in the simple exponen- 
tial form exp[^ fe a.k(Ok — < Ok >)]• This is certainly the 
case for the Jastrow factor J, where the operators Ok are 
just density-density or spin-spin correlations. Here for 
convenience we have subtracted from each Ok the corre- 
sponding average value < Ok >, which provides an irrel- 
evant multiplicative constant in the WF. For the varia- 
tional parameters corresponding to the uncorrelated part 
l-D), this exponential form is not exactly fulfilled, but is 
valid only for small changes of the variational parameters 
\ip a+7 ) oc expE fc 7*(Ojt- < Ok >)}\4> a ), namely within 
linear order in 7. In such a case, this expression does 
not provide the exact second derivative of the WF with 
respect to 7. Although this information is in principle 
required for the NM, we will show that the accurate de- 
termination of these terms is not really important within 
VMC, because the standard NM is rather inefficient even 
when these second WF derivatives are computed exactly. 
To this purpose we write IV'a+7) m a slightly more gen- 
eral form that depends on a further parameter j3: 

|t/W 7 ) ~ [l + ^T 7fc (O fc - <O k >) (4) 
k 

+ ! 5>fc7fc'(O fc - < Ok >){O k ,- < O k ' >)]\i/> a ) 



k,k' 



As discussed before, this expansion is valid only within 
linear (quadratic) order in 7 for /3 7^ 1 (/3 = 1 Jas- 
trow case). We consider this more general form because 
the value of /? can be used to improve the efficiency of 
the minimization scheme within the VMC. On the other 
hand it is clear that, as the minimum is approached, the 
non linear contribution proportional to f3, as well as the 
higher order ones, become negligible and irrelevant for 
the WF optimization. 
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By substituting expression J2J in Eq.JSJ, the energy 
can be systematically expanded in powers of "/ k : 

AE = - E ^ + \ E ™ + (! + ^ G ^ k ' ( 5 ) 

where: 

S k h ' k ' = <[O fc ,[ff,Ov]]> (6) 

G fe < fe ' = 2<(H- E a )(O k - < O k >){O k >- < O k > >) > 

fk = -d ak E a = -2 < (H - E a )O k > (7) 

In the above equations we have used the hermitian char- 
acter of all the operators involved, implying for instance 
that < O k H >=< HO k >; fk indicate the forces act- 
ing on the variational parameters and vanishing at the 
minimum energy condition, S h ' represent the excitation 
matrix elements corresponding to the operators O k , G k,k 
take into account the remaining contributions appearing 
when the WF is not exact (< H — E a >^ 0), whereas the 
square brackets indicate the commutator. The WF pa- 
rameters can be then iteratively changed a k — * cx k + "fk , 
by minimizing Eq.JSJ whenever Sh + (1 + [3)G is positive 
definite. The minimum energy is obtained for: 

7 = B~H with : (8) 
B = S h + (l+p)G (9) 

If B is not positive definite, the quadratic form (J5J 
is not bounded from below, meaning that higher order 
terms are important in the expansion JJjJ. In this case 
the correction JHJ may lead to a higher energy rather 
than to a lower energy WF. In order to overcome this 
difficulty, we change the matrix B — > B + fiS where S is 
the positive definiteii overlap matrix: 

S k > k ' =< (O fc - < O k >){Oy- < O k , >) > (10) 

Similarly to the previous Stochastic Reconfiguration 
technique^, we use the same matrix S in order to have 
a well defined minimum for the energy AE in all cases. 
This is achieved by imposing the constraint that the lin- 
ear WF change AWF = (\tp a+7 ) — \ip a ))/\ip a \, obtained 
with = in EqQ] cannot be larger than a certain 
amount r. Hence the control parameter r is defined by 
means of the inequality: 

|AWF| 2 < r 2 (11) 

where, from Eas. (|3ITUfl . |AWF| 2 = J2k.k> lklk'S k ^ k ' . 
This constraint allows to work always with a positive def- 
inite matrix B and, for small enough r, the energy is cer- 
tainly lowered by changing the parameters according to 
©. The constant /i > is non zero whenever is non 
positive definite or |AWF| corresponding to © exceeds 
r. In these cases the constant /i is obtained as a Lagrange 
multiplier, namely by minimizing AE + /i|AWF| 2 , with 
the condition |AWF| = r, which is simple to fulfill with 



standard iterative schemes. A similar control parameter 
was used in Ref 6, by adding the identity matrix / scaled 
by adiag > to the Hessian, namely B = Sh+2G+adiagI, 
so that for adiag —> 00 the steepest descent is obtained, 
whereas within our scheme, for \i — > 00 the stochastic re- 
configuration (SR) technique is recovered with B = Sj At 
and At = 1/(1 smalliS 

As it is clear from its definition Q the matrix G is 
zero when {ip a \ coincides with the exact ground state, 
because in this case (ip a \(H — E a ) = 0. Therefore this 
matrix should be very small compared with Sh for a good 
variational WF. This implies that, by changing (3 in Eq|51 
we can obtain iterative methods converging to the mini- 
mum energy with approximately the same small number 
of iterations. As we will see in the following, for the 
optimization of the Jastrow (Jastrow and SDBCS) it is 
particularly convenient to use j3 = — 1 (/3 = Q), so that 
B = Sh + [J.S (B = Sh + G + [iS), with /i determined by 
(|llfl when different from zero. Henceforth the (3 = 1 tech- 
nique is indicated by NM, namely the standard Newton 
method used in Refs.l|a ll2j) . whereas the one defined by 
an appropriate choice of the parameter (3 7^ 1, is named 
SR with Hessian acceleration (SRH). 
Statistical averages in VMC. In VMC the statistical av- 
erages required for each iteration are evaluated over M 
sampled configurations {^ij-named bin-, obtained af- 
ter a small equilibration, and distributed according to 
the square of the WF. We indicate the correspond- 
ing statistical averages with the symbol <<>>, e.g. 
<< O k (x) »= 1/MYsiOkixi), so that for large M 
<< O k (x) » coincides with the exact quantum aver- 
age < O k >, with a statistical accuracy oc -^j- 

After each bin we change the variational parameters 
according to Eq.JSJ), by using the appropriate matrix B 
discussed before. The SRH is stable for small enough r 
and may converge much faster than SR. With a larger bin 
length M, the value of r can be substantially increased, 
leading therefore to a much faster convergence within the 
SRH minimization scheme. 

As also noted in Ref|y it is extremely important 
that the quantities evaluated within the statistical ap- 
proach are obtained by means of fluctuations 8A(x) = 
A(x)— << A(x) >> of suitable variables A(x), that de- 
pend only on the electronic configuration x sampled with 
VMC. Therefore, it is useful to express the forces f k , the 
matrices Sh and S by appropriate fluctuation averages: 

f k = -2 « 5e L (x)50 k (x) » (12) 

S k ' k ' = « 8d ak e L (x)60 k '(x) » +(k <-> k') (13) 

S k > k ' = «50 k (x)50 k ,(x)» (14) 

G h,W = 2«5e L {x)80 k {x)50 k ,{x)» (15) 

where cl{x) = ^°^J^ is the so called local en- 
ergy. In order to derive Ea. <|13() we insert a com- 
pleteness J2 X \ X )( X \ = 1 m tne of Eqs.©, 
and we also make use of the identity (x\HO k \ip a ) = 
(d ak eL(x) + Ok(x)eij(x))il) a (x). Then we notice that 
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<< d ak e L {x) >>= for M — > oof^ so that the 
fluctuations of d ak eL(x) can conveniently appear in 
Ea. (fLl)l . This way to evaluate Sh is affected by much 
smaller statistical errors than the previous estimate^ 
<< d ak eL{x)Ok'(x) >> +(k <-* k'). For the same rea- 
son the matrix B, used in the iteration (JHJ), is affected 
by the large noise present in G. Indeed, for f3 = — f, 
B does not depend on G and the corresponding statisti- 
cal fluctuations are much reduced, especially for a large 
number of electrons N e . In fact the operators Ok and the 
local energy scale with N e , their fluctuations Se^, SOk in 
(|f 5fl with y^ej whereas the corresponding statistical av- 
erages for G are of order N e , because they contribute to 
an extensive energy in JSJ. This implies zero signal to 
noise ratio (~ ,^ ) in the matrix G (or similarly in Sh 

without using the correlation I13fl for large N e compro- 
mising the efficiency of the VMC. On the other hand it 
is clear that both the matrices S and Sh in Ecis. H13I14|) 
are much better behaved, because they are obtained by 
averaging statistically quantities of order N e , that have 
a mean value of the same order. 

Unfortunately for the optimization of \D), uncon- 
trolled divergences in the evaluation of S h ' appear when 
both indexes k and k! correspond to the SDBCS param- 
eters. This is due to the zeros of (x\D), implying that 
d ak eL(x) — —Ok(x)eL(x) and Ok'(x) wildly fluctuate in 
Ea. (|f 3fl because they can both diverge for ip a (x) — > 0. 
The way to overcome this difficulty is to use (3 = in 
Eq.® since the most relevant divergences in Sh are ex- 
actly canceled by G just for this value of (3. In prin- 
ciple the matrix B cannot be evaluated statistically for 
[3 =/= because the statistical uncertainty remains even 
for M — > co, due to these large fluctuations. For (3 = 
and for large number of electrons N e , the problem of zero 
signal to noise ratio in B cannot be avoided for Jastrow 
and SDBCS optimization, but can be substantially al- 
leviated by a good WF choice, because G has the zero 
variance property: G — without noise, whenever tp a is 
an exact eigenstate. 

Results We have tested the simple SR method, the 
presently discussed SRH one, and the recently introduced 
NM 6 ((3 = 1 in our notations) on simple L— sites lattice 
models, the ID Heisenberg model (1DHM) and the t-J 
modeU*, because we believe they represent useful bench- 
marks to compare various techniques. The Jastrow factor 
of the initial WF was set to zero for both models. More- 
over, consecutive configurations in each bin are separated 
by 2 x L Metropolis attempts. 

For the 1DHM we use a WF containing a long range 
Jastrow factor in the form: exp(l/2j2 l j vf^S^ Sj)\D > 
where \D) is a Neel state with magnetization along the 
x— spin axis. Here S are usual spin-1/2 operators and 
projection over zero total spin— z component is also as- 
sumed. The long range Jastrow is essential to destroy 
the long range magnetic order in ID and obtain a sensi- 
ble and accurate ansatai^. By using all spatial symme- 
tries of the model, we remain with L/2 — 1 independent 




# Iterations R 



FIG. 1: Left: convergence of the SRH technique for different 
sizes. Right: the same for all the 149 independent VMC 
parameters for the maximum size studied. For all simulations 
the bin length (the control parameter r defined in Eq. lllH ') 
M was taken proportional to L 2 (L), starting from M = 500 
(r = 2) for L = 30. 



parameters in the Jastrow factor. 

As shown in Fig.0J, with the present SRH technique, a 
converged value of the nearest neighbor v z is obtained in 
less than 10 iterations for all sizes studied (similar conver- 
gence is obtained at all distances, see right panel), imply- 
ing that the use of the matrix Sh {(3 = — 1 has been used 
in this case) is very effective to accelerate the convergence 
to the minimum energy WF. By the SR technique, stabil- 
ity forces a very slow convergence: the required number 
of iterations increases with the system size oc L 2 because 
the model is gapless and accurate calculations become 
prohibitive already for L ~ 100. 

As shown in Fig. @ SRH is remarkably faster than SR. 
After convergence, with much longer runs, SR and NM 
appear > 50 and > 10 times less efficient than SRH re- 
spectively, for obtaining the variational parameter within 
a given statistical error, by performing a statistical aver- 
age, as discussed in RefllCl SR has a much larger cor- 
relation time and NM is affected by larger fluctuations. 
The performances of SRH appear always optimal just for 
[3 = — 1, especially for large N e , due to the argument 
presented below Ea. Q15f) . This value of (3 allows also to 
use small bin length (see Fig[5J and to be very efficient 
far away from convergence. 

Optimization of \D) . We have also tested the SRH tech- 
nique in the 2D t — J model at J/t = 0.4i4 for optimiz- 
ing the parameters contained in the D = \BCS) where 
the BCS Hamiltonian defining the BCS state has a gap 
function = &(cosk x — cosk y ) and a dispersion band 
ek = — 2(cosfc2; + cosky) — At' cos k x cos k y — fi , where 
/io,i' and A are the three variational parameters, that 
are optimized together with all the forty (ten) indepen- 
dent ones contained in the density- density Jastrow fac- 
tor exp(l/2 ^\ j VijUiUj) for L = 242 (L = 50). In this 
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FIG. 2: Convergence of the VMC n.n. spin Jastrow for 
various methods (see text). For the SR technique At J = 
0.125. For the NM the first 4 iterations are with adiag = 0.02, 
for the next 6 adiag ~ 0.002 and the rest with adiag = 0. For 
M — 2500 the NM is unstable after 30 iterations due to a 
big fluctuation of the matrix G (see text) not used in SRH. 
For the SRH technique r = 5 for M = 2500 and r = 10 for 
M = 5000. After the third iteration |AWF| < r is always 
satisfied, even after further 1000 iterations (not shown). 



this way, in the iteration JSJ) , the noisy and computation- 
ally expensive (~ 5 times more) SDBCS matrix elements 

S h ' are replaced by S k ' k /At, whereas all the other ma- 
trix elements of Sh are evaluated including also the ones 
coupling J and \D). For these matrix elements an esti- 
mator different from the symmetric one (I13f) is used for 

S h : S k ' k ' = S k h '- k = 2 « 5d ah e L {x)50 k ,{x) », involv- 
ing only local energy derivatives of Jastrow parameters. 
Afterall suitable values of r and At provide performances 
similar to SRH for the size studied, with a much cheaper 
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discussed before, /3 = has to be used, namely 
B = Sh + G in the iteration scheme JSJ. Moreover, in 
contrast to the Jastrow case, the condition (|llfl cannot 
be always verified with /i = after equilibration, due to 
occasionally large fluctuations in the matrix B. Never- 
theless, the SRH remains stable, by using a small enough 
value of r. In this way, these large fluctuations of B are 
very efficiently damped by condition lllfl . As shown in 
Fig-©i a l so m this case the SRH provides a substantial 
reduction of the iterations required for convergence. As 
expected this is accompanied also by a sizable increase 
of the fluctuations in the parameters especially for the 
chemical potential [1q. Indeed, though the efficiency in- 
creases with L, in favor of SRH. this improvement is not 
as important as in the previous case, for the optimization 
of the Jastrow part alone. 

For the simultaneous optimization of the Jastrow and 
SDBCS parts of the WF, a more efficient scheme is to use 
the SR for the SDBCS parameters and the SRH with j3 = 
— 1 for the remaining ones (hybrid method in FigOJ . In 



FIG. 3: Parameters in \D), while optimizing the Jastrow part 
simultaneously, for the t — J model at J/t = 0.4 with bin M = 
5000. For SRH the control parameter is r — 0.5 (r = 0.1) for 
L = 242 (L = 50), for SR J At = 0.125. The hybrid method 
(HM) is also shown (left) for fj, (r = 1 J At = 1.25). 



computational cost per iteration and with a signal to 
noise ratio certainly stable for large N e , as only S h and 
S (and not G) are used in this case. 

In conclusion, the optimization methods we have de- 
scribed, based on a very efficient VMC evaluation of the 
Hessian matrix, open the possibility to study electronic 
systems with correlated WF containing many variational 
parameters, and with an efficiency that is now compara- 
ble with non statistical methods of some time age 16 . 
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